Assignment 2. Visualization of income in Swedish households

Download data, Disposable income for households by region, type of households and age, for three different group ages:
18-29
30-49
50-64
For year 2016.


We read data of swedish counties from the map into json.
We also change the data Disposable income for households by region, type of households and age on the way that age groups are shown in different columns named data.
18-29: Young
30-49: Adult
50-64: Senior

The head of processed data:

## Warning: package 'plotly' was built under R version 4.4.2
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
##         region Young Adult Senior
## 1    Stockholm 418.9 716.4  742.5
## 2      Uppsala 327.6 589.9  631.1
## 3 Södermanland 346.0 532.0  551.6
## 4 Östergötland 316.0 546.4  579.4
## 5    Jönköping 359.6 563.7  605.0
## 6    Kronoberg 334.4 546.9  577.2

Create a plot in Plotly containing three violin plots showing mean income distributions per age group.

Here we plot two different data :Data and Processed data
We can see difference in the xtitle of the maps.

We can see the big difference between the income range of young compare to adult and senior.
It seems that the income increase between the level young , adult and senior.  But the differences for income of young people with others is a lot.
The median for young people is 334, for adult is 531 and for Senior is 551.
Also the range of income change in young employee is less than others so the income differences when people are adult or senior is more possible.


part3

plot in Plotly showing dependence of Senior incomes on Adult and Young incomes in various counties.

The surface plot:

## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## The following object is masked from 'package:plotly':
## 
##     select
## Warning: package 'akima' was built under R version 4.4.2

Here we can see Positive Trend Senior income increase as young and adult increase.

Linear regression seems to be reasonable as the surface follow a linear pattern.


First we print the json data properties and we see that counties is in properties.NAME_1.
scope = europe

## $GID_1
## [1] "SWE.2_1"
## 
## $GID_0
## [1] "SWE"
## 
## $COUNTRY
## [1] "Sweden"
## 
## $NAME_1
## [1] "Dalarna"
## 
## $VARNAME_1
## [1] "Dalecarlia|Kopparberg"
## 
## $NL_NAME_1
## [1] "NA"
## 
## $TYPE_1
## [1] "Län"
## 
## $ENGTYPE_1
## [1] "County"
## 
## $CC_1
## [1] "NA"
## 
## $HASC_1
## [1] "SE.KO"
## 
## $ISO_1
## [1] "NA"

Incomes of Young

Incomes of Adult

By looking at the map we can find the cluster of regions that are similar (which is not seen in previous statistical plot). We can see that the mean of income in North of Sweden is less than south.
Find unusual regions (compared to neighbor regions) and patterns: Here, we see Stockholm is the best place to work for all two age groups. Added to that for young people is not a bad choice to o to the north for working, but for Adult there is no big difference between the north and middle and the best choice is south of Sweden.


part5

Add a red dot to the choropleth map for Young from step 4 in order to show where we are located:

We find latitude,longitude of Linkoping the the site, http://www.gpsvisualizer.com/geocoder/ :

latitude,longitude,name,desc,color,source,precision
58.41109,15.62565,Linkoping,“Linkoping, Linkoping, E

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

As we can see, regarding to the dataset the average income for Young people is in Östergötland is in lowest place compare to other part of Sweden.

Apendix

knitr::opts_chunk$set(echo = TRUE)

# Assignment 2 
#Part1 : 
#Download map 

library(rjson)
library(plotly)
library(dplyr)
library(ggplot2)
#library(MASS)
library(readr)
library(plotly)

json<-fromJSON(file="gadm41_SWE_1.json")
df<-read.csv("000006SW_20230919-130923.csv", fileEncoding="latin1")

#Read your data into R and process it in such a way
#that different age groups are shown in
#different columns. Let’s call these groups Young,
#Adult and Senior

data <- df %>%
  filter(age %in% c("18-29 years"))%>%
  dplyr::select(region,X2016)%>%
  dplyr::rename(Young = X2016)

data<-mutate(data,df %>%
               filter(age %in% c("30-49 years"))%>%dplyr::select( X2016)%>%
               dplyr::rename(Adult = X2016))%>%mutate(data,df %>%
                                                  filter(age %in% c("50-64 years"))%>%dplyr::select( X2016)%>% 
                                                  dplyr::rename(Senior = X2016))

head(data)
#Part2
#Create a plot in Plotly containing three violin plots 
#showing mean income

#plot_ly by default data 
p3<-plot_ly(df, x=~factor(age), y=~X2016, split=~factor(age),
            type="violin", box=list(visible=T))%>% layout(yaxis=list(title="Mean income distributions"),
                                                                     xaxis=list(title="Age Group"))
p3


#plot_ly by Processed data 
p4<-plot_ly(data, type = "violin", box = list(visible = TRUE)) %>%
  add_trace(y = ~Young, name = "Young") %>%
  add_trace(y = ~Adult, name = "Adult") %>%
  add_trace(y = ~Senior, name = "Senior") %>%
  layout(title = "Mean Income Distribution by processed data",yaxis=list(title="Mean income distributions"),
         xaxis=list(title="Age Group"))

p4


#part3
library(dplyr)
library(ggplot2)
library(MASS)

library(plotly)

#data %>%plot_ly(x=~Young, y=~Adult, z=~Senior, type="scatter3d")
#data %>%plot_ly(x=~Young, y=~Adult, z=~Senior, type="contour")


library(akima)
attach(data)
s=interp(Young,Adult,Senior, duplicate = "mean")
detach(data)

plot_ly(x=~s$x, y=~s$y, z=~s$z, type="surface") %>%
  layout(scene = list(
    xaxis = list(title = "Young Income"),
    yaxis = list(title = "Adult Income"),
    zaxis = list(title = "Senior Income")
  ),
  title = "Dependence of Senior Incomes on Adult and Young Incomes by Region")


#part4
print(json$features[[2]]$properties)
#plot
g <- list(fitbounds = "location", visible = FALSE,scope = 'europe')
p1 <- plot_geo(data) %>% add_trace(
  type = "choropleth",
  geojson = json,
  locations = ~region,
  z = ~Young,
  featureidkey = "properties.NAME_1") %>%
  layout(geo = g, title = "Young Incomes by Region")

p1
p2 <- plot_geo(data) %>% add_trace(
  type = "choropleth",
  geojson = json,
  locations = ~region,
  z = ~Adult,
  featureidkey = "properties.NAME_1") %>%
  layout(geo = g, title = "Adult Incomes by Region")

p2


##ccoordinates of linkoping
#latitude,longitude,name,desc,color,source,precision
#58.41109,15.62565,Linkoping,"Linkoping, Linkoping, E, SE",,MapQuest,city/town
lat=58.41109
long=15.62565

p1 <- plot_geo(data) %>% add_trace(
  type = "choropleth",
  geojson = json,
  locations = ~region,
  z = ~Young,
  featureidkey = "properties.NAME_1") %>%
  layout(geo = g,title = "Adult Incomes by Region, Red dot is Linkoping ") %>%
  add_markers(
    x = ~long,
    y = ~lat,
    color = "red",
    size=2
  )
  

p1